Lim Zhi Chao (A0252895N)¶

Link to GitHub repository: https://github.com/lzc88/DSA4262

NOTE: Running the entire notebook requires an API key from data.gov.sg. The code reads it from the DATA_GOV_SG_API_KEY environment variable, which is loaded from ../.env.

Setting Up¶

Import Dependencies¶

In [34]:
import os
import json
import time
import requests
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
from dotenv import load_dotenv

pio.renderers.default = "notebook_connected"
load_dotenv("../.env")
Out[34]:
True

Helper Functions¶

In [35]:
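def gho_data_with_details(code):
    """Print the title of a WHO GHO indicator and return its records for Singapore."""

    # Look up the indicator's metadata to confirm the code and get its title
    url = (
        "https://ghoapi.azureedge.net/api/DIMENSION/GHO/DimensionValues"
        f"?$filter=Code eq '{code}'"
    )

    r = requests.get(url, timeout=30)
    r.raise_for_status()

    data = r.json()["value"][0]
    print(f"Dataset {data['Code']}: {data['Title']}")

    # Fetch the indicator's data points, restricted to Singapore (SGP)
    url = f"https://ghoapi.azureedge.net/api/{code}?$filter=SpatialDim eq 'SGP'"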

    r = requests.get(url, timeout=30)
    r.raise_for_status()

    data = r.json()["value"]

    # Uncomment to view sample data point

    # print(f"{len(data)} data points")

    # sample = data[0]

    # print("Sample data with the following fields:")

    # for idx, k in enumerate(sample):
    #     print(f"[{idx+1:02d}] {k}: {sample[k]}")

    return data


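def fetch_with_retry(url, headers, max_retries=3):
    """GET a JSON response, backing off exponentially on HTTP 429 (rate limit)."""

    for attempt in range(max_retries):

        # A timeout prevents the notebook from hanging on a stalled connection
        r = requests.get(url, headers=headers, timeout=30)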

        if r.status_code == 429:
            wait = min(60, 2**attempt)
            print(f"429 rate limit. Sleep {wait}s then retry...")
            time.sleep(wait)
            continue

        r.raise_for_status()

        return r.json()

    raise RuntimeError("Too many 429s; try again later.")


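def _flatten_coords(coords):
    """Yield (lon, lat) pairs from Polygon or MultiPolygon coordinates."""

    if not coords:
        return

    # Polygon: coords[0][0] is a position, so coords[0][0][0] is a number.
    # MultiPolygon: one more level of nesting, so coords[0][0][0] is a list.
    if isinstance(coords[0][0][0], (int, float)):
        polys = [coords]
    else:
        polys = coords

    for poly in polys:
        for ring in poly:
            for point in ring:
                # GeoJSON positions may carry an optional third value (altitude)
                yield point[0], point[1]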


def plot_geojson(geo_data, df, zoom=10.5, title="Insert Title"):
    """Draw a choropleth of df["number"] by area_key, centred on the GeoJSON's bounding box."""

    fig = go.Figure(
        go.Choroplethmap(
            geojson=geo_data,
            locations=df["area_key"],
            z=df["number"],
            featureidkey="properties.area_key",
            colorscale="Reds",
            marker_line_width=0.5,
            colorbar_title="Population",
        )
    )

    lons, lats = [], []
    for feat in geo_data["features"]:
        geom = feat["geometry"]
        coords = geom["coordinates"]
        for lon, lat in _flatten_coords(coords):
            lons.append(lon)
            lats.append(lat)

    min_lon, min_lat, max_lon, max_lat = min(lons), min(lats), max(lons), max(lats)

    fig.update_layout(
        map=dict(
            style="carto-positron",
            center={"lat": (min_lat + max_lat) / 2, "lon": (min_lon + max_lon) / 2},
            zoom=zoom,
        ),
        margin={"r": 0, "t": 40, "l": 0, "b": 0},
        title=title,
    )

    fig.show()

Topic: Suicide Risks in Singapore¶

While Singapore’s overall suicide rate has declined over the past two decades, this aggregate trend masks persistent and uneven risks across demographic groups.

This motivates an actionable question:

If resources are limited, where and for whom should prevention and mental-health services be prioritised to reduce risk most effectively?

Datasets¶

  • WHO GHO API
    • MH_12 - Age-standardised suicide rates (per 100 000 population)
    • SDGSUICIDE - Crude suicide rates (per 100 000 population)
  • data.gov.sg API
    • HDB - Elderly and Future Elderly Resident Population by Geographical Distribution
    • URA - Master Plan 2019 Subzone Boundary (No Sea) (GEOJSON)

Macro Plots¶

Age-standardised suicide rates (per 100 000 population)¶

In [36]:
data = gho_data_with_details(code="MH_12")

required_fields = ["TimeDim", "Dim1", "NumericValue", "Low", "High"]

df = pd.DataFrame(data=data)
df = df[required_fields]
# Filter to only show data for both sexes
df = df[df["Dim1"] == "SEX_BTSX"]
# Sort by year
df = df.sort_values(["TimeDim"])
df = df.reset_index(drop=True)

# print("Filter to only show data where Dim1=SEX_BTSX (both sexes)")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)
Dataset MH_12: Age-standardized suicide rates (per 100 000 population)
In [37]:
fig = go.Figure()

# Uncertainty band
fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["High"],
        mode="lines",
        line=dict(width=0),
        showlegend=False,
        hoverinfo="skip",
    )
)

fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["Low"],
        mode="lines",
        line=dict(width=0),
        fill="tonexty",
        fillcolor="rgba(144, 238, 144, 0.4)",
        name="Uncertainty (Low–High)",
    )
)

# Main line
fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["NumericValue"],
        mode="lines+markers",
        name="Population Average (Both Sexes)",
    )
)

# Only show years with data points
years = sorted(df["TimeDim"].unique())
fig.update_xaxes(tickmode="array", tickvals=years)

fig.update_layout(
    title="Singapore: Age-standardised Suicide Rate (Both Sexes)",
    xaxis_title="Year",
    yaxis_title="Suicide rate per 100,000 population",
    template="simple_white",
    hovermode="x unified",
)

fig.show()

A line chart with an uncertainty band was chosen because it clearly communicates long-term trends while acknowledging uncertainty in the estimates. This allows sustained movements to be distinguished from short-term fluctuations.

This macro-level trend is important because it provides a national context for understanding suicide as a public health issue and helps assess whether population-wide interventions have coincided with changes over time.

The declining age-standardised suicide rate suggests overall improvement in mental health outcomes at the population level, which may reflect advances in healthcare access, economic stability, and public awareness.

For policymakers, this indicates that broad strategies may be having an effect, but it does not identify which demographic groups continue to face elevated risk.

A key limitation of this dataset is that a single age-standardised rate for both sexes masks heterogeneity across age and sex groups, meaning that subgroup vulnerabilities may be hidden despite favourable national averages.
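To make this limitation concrete, the following is a minimal sketch of direct age standardisation. Every number in it is hypothetical and chosen only for easy arithmetic: the rates are not Singapore data and the weights are not the WHO standard population. The function name `weighted_rate` is likewise an assumption for illustration.

```python
# Hypothetical age-specific rates (per 100,000) and population shares.
# None of these figures come from the WHO datasets used in this notebook.
age_specific_rates = {"10-39": 5.0, "40-69": 10.0, "70+": 20.0}
actual_population_share = {"10-39": 0.40, "40-69": 0.40, "70+": 0.20}
standard_population_share = {"10-39": 0.60, "40-69": 0.30, "70+": 0.10}


def weighted_rate(rates, weights):
    """Weighted average of age-specific rates; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(rates[group] * weights[group] for group in rates)


# Crude rate: weight by the population's own age structure
crude_rate = weighted_rate(age_specific_rates, actual_population_share)
# Age-standardised rate: weight by a fixed reference age structure
standardised_rate = weighted_rate(age_specific_rates, standard_population_share)

print(f"Crude rate: {crude_rate:.1f} per 100,000")
print(f"Age-standardised rate: {standardised_rate:.1f} per 100,000")
```

With these made-up inputs the crude rate is 10.0 and the standardised rate is 8.0. Both collapse the same age-specific rates into one figure, so the much higher rate in the oldest group is not visible in either summary.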

Micro Plots¶

Crude suicide rates (per 100 000 population)¶

In [38]:
data = gho_data_with_details(code="SDGSUICIDE")

required_fields = ["TimeDim", "Dim1", "Dim2", "NumericValue", "Low", "High"]

valid_age_groups = [
    "AGEGROUP_YEARS10-19",
    "AGEGROUP_YEARS20-29",
    "AGEGROUP_YEARS30-39",
    "AGEGROUP_YEARS40-49",
    "AGEGROUP_YEARS50-59",
    "AGEGROUP_YEARS60-69",
    "AGEGROUP_YEARS70PLUS",
]

df = pd.DataFrame(data=data)
df = df[required_fields]
df = df[df["Dim2"].isin(valid_age_groups)]
# Filter to only show data from 2021 (the only year that has data across age groups)
df = df[df["TimeDim"] == 2021]
# Sort by sex, then age group
df = df.sort_values(["Dim1", "Dim2"])
df["Sex"] = (
    df["Dim1"]
    .str.replace("SEX_BTSX", "Both Sexes")
    .str.replace("SEX_MLE", "Male")
    .str.replace("SEX_FMLE", "Female")
)
df["AgeGroupLabel"] = (
    df["Dim2"].str.replace("AGEGROUP_YEARS", "").str.replace("PLUS", "+")
)
# Drop unnecessary columns
df = df.drop(columns=["Dim1", "Dim2", "Low", "High"])
df = df.reset_index(drop=True)

# print("Filter to only show data from 2021")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)
Dataset SDGSUICIDE: Crude suicide rates (per 100 000 population)
In [39]:
fig = go.Figure()

subset = df[df["Sex"] == "Male"]
fig.add_trace(
    go.Bar(
        x=subset["AgeGroupLabel"],
        y=subset["NumericValue"],
        name="Male",
        marker_color="steelblue",
    )
)

subset = df[df["Sex"] == "Female"]
fig.add_trace(
    go.Bar(
        x=subset["AgeGroupLabel"],
        y=subset["NumericValue"],
        name="Female",
        marker_color="indianred",
    )
)

fig.update_layout(
    title="Singapore: Suicide Rates by Age Group and Sex (2021)",
    xaxis_title="Age group",
    yaxis_title="Suicide rate per 100,000 population",
    barmode="group",
    template="simple_white",
    hovermode="x unified",
)

fig.show()

A grouped bar chart was selected as it enables direct comparison of suicide rates across age groups while highlighting differences between males and females within each group.

This micro-level analysis is important because suicide risk is known to vary substantially across age and sex, and identifying high-risk groups is essential for targeted prevention.

The chart shows that suicide rates increase with age and are consistently higher among males, with older men exhibiting the highest rates, indicating a concentrated vulnerability among elderly populations.

For clinicians and policymakers, this highlights the need for age- and sex-specific interventions, particularly for older adults, who may face compounding risks such as social isolation, chronic illness, and reduced mobility.

A limitation of this dataset is that it reflects a single year of data and does not capture longitudinal changes or causal factors underlying these observed differences.
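One way to put a number on the sex gap described above is a male-to-female rate ratio per age group. The sketch below assumes records shaped like the GHO API response (`Dim1` for sex, `Dim2` for age group, `NumericValue` for the rate). The rates themselves are invented for illustration and the helper name `male_to_female_ratio` is hypothetical.

```python
# Hypothetical records in the shape returned by the GHO API.
# The rates are made up and are not the 2021 Singapore values.
records = [
    {"Dim1": "SEX_MLE", "Dim2": "AGEGROUP_YEARS20-29", "NumericValue": 12.0},
    {"Dim1": "SEX_FMLE", "Dim2": "AGEGROUP_YEARS20-29", "NumericValue": 8.0},
    {"Dim1": "SEX_MLE", "Dim2": "AGEGROUP_YEARS70PLUS", "NumericValue": 30.0},
    {"Dim1": "SEX_FMLE", "Dim2": "AGEGROUP_YEARS70PLUS", "NumericValue": 10.0},
]


def male_to_female_ratio(rows):
    """Return {age_group: male_rate / female_rate} for groups with both rates."""
    rates = {}
    for row in rows:
        rates.setdefault(row["Dim2"], {})[row["Dim1"]] = row["NumericValue"]

    ratios = {}
    for age_group, by_sex in rates.items():
        male = by_sex.get("SEX_MLE")
        female = by_sex.get("SEX_FMLE")
        # Skip groups with a missing rate or a zero female rate
        if male is None or not female:
            continue
        ratios[age_group] = male / female
    return ratios


ratios = male_to_female_ratio(records)
for age_group, ratio in ratios.items():
    print(f"{age_group}: {ratio:.2f}")
```

A ratio above 1 means the male rate exceeds the female rate in that age group. Because the ratio is relative, it should be read together with the absolute rates in the bar chart.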

Actionable Plots¶

HDB Elderly and Future Elderly Resident Population by Geographical Distribution¶

In [40]:
dataset_id = "d_4180067b350bc9839a4cea487841d5d1"
headers = {"x-api-key": os.getenv("DATA_GOV_SG_API_KEY")}
base_url = "https://data.gov.sg/api/action/datastore_search"

limit = 100
offset = 0

url0 = f"{base_url}?resource_id={dataset_id}&limit={limit}&offset={offset}"
j0 = fetch_with_retry(url0, headers=headers)

total = j0["result"]["total"]
data = j0["result"]["records"]

print("Total rows:", total)

offset += limit
while offset < total:

    url = f"{base_url}?resource_id={dataset_id}&limit={limit}&offset={offset}"

    j = fetch_with_retry(url, headers=headers)

    batch = j["result"]["records"]

    if not batch:
        break

    data.extend(batch)
    offset += limit

    time.sleep(2)

print("Fetched rows:", len(data))

df = pd.DataFrame(data=data)
# Normalise town estate names
df["area_key"] = df["town_estate"].str.upper().str.strip()
# Filter to only show data from 2018 (the latest year)
df = df[df["shs_year"] == "2018"]
# Sort by town estate and elderly type
df = df.sort_values(["town_estate", "elderly_pop"])
df = df.reset_index(drop=True)

# print("Filter to only show data from 2018")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)
Total rows: 208
Fetched rows: 208

Master Plan 2019 Subzone Boundary (No Sea) (GEOJSON)¶

In [41]:
dataset_id = "d_8594ae9ff96d0c708bc2af633048edfb"
headers = {"x-api-key": os.getenv("DATA_GOV_SG_API_KEY")}
url = f"https://api-open.data.gov.sg/v1/public/api/datasets/{dataset_id}/poll-download"

r = requests.get(url=url, headers=headers, timeout=30)
r.raise_for_status()

data = r.json()

# The poll-download response contains the download URL of the GeoJSON file
url = data["data"]["url"]

r = requests.get(url=url, headers=headers, timeout=30)
r.raise_for_status()

geo_data = json.loads(r.text)

# Normalise GeoJSON town estate names
for feature in geo_data["features"]:
    feature["properties"]["area_key"] = (
        feature["properties"]["PLN_AREA_N"].upper().strip()
    )
In [42]:
df_areas = set(df["area_key"])
geo_areas = {feature["properties"]["area_key"] for feature in geo_data["features"]}

print("In data but not map:", df_areas - geo_areas)
print("In map but not data:", geo_areas - df_areas)

print(
    "\nAfter inspecting both datasets, WHAMPOA does not exist in the GeoJSON.\nTherefore, we replace KALLANG/WHAMPOA with KALLANG for better representation.\n"
)

df["area_key"] = df["area_key"].replace({"KALLANG/WHAMPOA": "KALLANG"})

df_areas = set(df["area_key"])
geo_areas = {feature["properties"]["area_key"] for feature in geo_data["features"]}

print("In data but not map:", df_areas - geo_areas)
print("In map but not data:", geo_areas - df_areas)
In data but not map: {'KALLANG/WHAMPOA', 'CENTRAL AREA'}
In map but not data: {'WOODLANDS', 'LIM CHU KANG', 'SINGAPORE RIVER', 'TUAS', 'CHANGI', 'SOUTHERN ISLANDS', 'NEWTON', 'SUNGEI KADUT', 'CHANGI BAY', 'DOWNTOWN CORE', 'MANDAI', 'KALLANG', 'MARINA EAST', 'RIVER VALLEY', 'PIONEER', 'MARINA SOUTH', 'SELETAR', 'BOON LAY', 'WESTERN ISLANDS', 'ORCHARD', 'NORTH-EASTERN ISLANDS', 'NOVENA', 'OUTRAM', 'STRAITS VIEW', 'CENTRAL WATER CATCHMENT', 'ROCHOR', 'TANGLIN', 'WESTERN WATER CATCHMENT', 'TENGAH', 'SIMPANG', 'PAYA LEBAR', 'MUSEUM'}

After inspecting both datasets, WHAMPOA does not exist in the GeoJSON.
Therefore, we replace KALLANG/WHAMPOA with KALLANG for better representation.

In data but not map: {'CENTRAL AREA'}
In map but not data: {'WOODLANDS', 'LIM CHU KANG', 'SINGAPORE RIVER', 'TUAS', 'CHANGI', 'SOUTHERN ISLANDS', 'NEWTON', 'SUNGEI KADUT', 'CHANGI BAY', 'DOWNTOWN CORE', 'MANDAI', 'MARINA EAST', 'RIVER VALLEY', 'PIONEER', 'MARINA SOUTH', 'SELETAR', 'BOON LAY', 'WESTERN ISLANDS', 'ORCHARD', 'NORTH-EASTERN ISLANDS', 'NOVENA', 'OUTRAM', 'STRAITS VIEW', 'CENTRAL WATER CATCHMENT', 'ROCHOR', 'TANGLIN', 'WESTERN WATER CATCHMENT', 'TENGAH', 'SIMPANG', 'PAYA LEBAR', 'MUSEUM'}
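The set comparison above checks that names line up, but not how many map features sit behind each name. Since the boundary file is published at subzone level and the join key is the planning area name, one `area_key` may correspond to several features. The following is a minimal sketch of a check for this. It uses a hypothetical three-feature collection in place of `geo_data`, and the helper name `features_per_area` is an assumption, not part of the notebook.

```python
from collections import Counter

# Hypothetical stand-in for geo_data: only the properties matter here, and
# the number of features per area is invented for illustration.
geo_sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"area_key": "BEDOK"}, "geometry": None},
        {"type": "Feature", "properties": {"area_key": "BEDOK"}, "geometry": None},
        {"type": "Feature", "properties": {"area_key": "KALLANG"}, "geometry": None},
    ],
}


def features_per_area(geojson):
    """Count how many GeoJSON features carry each area_key."""
    return Counter(f["properties"]["area_key"] for f in geojson["features"])


counts = features_per_area(geo_sample)
# Keep only the keys that are shared by more than one feature
shared = {area: n for area, n in counts.items() if n > 1}
print("Areas backed by more than one feature:", shared)
```

If many keys turn out to be shared in the real file, it may be worth checking the rendered map to confirm that every subzone of a town is shaded as intended.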

Elderly Population (65+ Years Old) by Town Estate¶

In [43]:
df_elderly = df.copy()
df_elderly = df_elderly[df_elderly["elderly_pop"] == "Elderly"]
df_elderly["number"] = df_elderly["number"].astype(int)
df_elderly = df_elderly.sort_values(["number"], ascending=False)
df_elderly = df_elderly.reset_index(drop=True)

print(f"Elderly population across {len(df_elderly)} town estates")

plot_geojson(
    geo_data=geo_data,
    df=df_elderly,
    title="Elderly Population (65+ Years Old) by Town Estate",
)

df_elderly.head(10)
Elderly population across 26 town estates
Out[43]:
   _id  shs_year  elderly_pop  town_estate      number  area_key
0  173  2018      Elderly      Bedok            38200   BEDOK
1  174  2018      Elderly      Bukit Merah      34700   BUKIT MERAH
2  160  2018      Elderly      Jurong West      33400   JURONG WEST
3  175  2018      Elderly      Ang Mo Kio       31000   ANG MO KIO
4  161  2018      Elderly      Tampines         31000   TAMPINES
5  162  2018      Elderly      Hougang          27000   HOUGANG
6  176  2018      Elderly      Toa Payoh        25300   TOA PAYOH
7  163  2018      Elderly      Yishun           24800   YISHUN
8  177  2018      Elderly      Kallang/Whampoa  24100   KALLANG
9  164  2018      Elderly      Choa Chu Kang    24000   CHOA CHU KANG

Future Elderly Population (55-64 Years Old) by Town Estate¶

In [44]:
df_f_elderly = df.copy()
df_f_elderly = df_f_elderly[df_f_elderly["elderly_pop"] == "Future Elderly"]
df_f_elderly["number"] = df_f_elderly["number"].astype(int)
df_f_elderly = df_f_elderly.sort_values(["number"], ascending=False)
df_f_elderly = df_f_elderly.reset_index(drop=True)

print(f"Future elderly population across {len(df_f_elderly)} town estates")

plot_geojson(
    geo_data=geo_data,
    df=df_f_elderly,
    title="Future Elderly Population (55-64 Years Old) by Town Estate",
)

df_f_elderly.head(10)
Future elderly population across 26 town estates
Out[44]:
   _id  shs_year  elderly_pop     town_estate    number  area_key
0  187  2018      Future Elderly  Tampines       41000   TAMPINES
1  186  2018      Future Elderly  Jurong West    35800   JURONG WEST
2  199  2018      Future Elderly  Bedok          32300   BEDOK
3  189  2018      Future Elderly  Yishun         30000   YISHUN
4  190  2018      Future Elderly  Choa Chu Kang  28800   CHOA CHU KANG
5  188  2018      Future Elderly  Hougang        27200   HOUGANG
6  191  2018      Future Elderly  Hougang        26600   HOUGANG
7  201  2018      Future Elderly  Ang Mo Kio     22600   ANG MO KIO
8  200  2018      Future Elderly  Bukit Merah    22500   BUKIT MERAH
9  183  2018      Future Elderly  Sengkang       21200   SENGKANG

A choropleth map is an appropriate design choice because it shows resident counts geographically, making it easier to identify the town estates where large numbers of elderly and future elderly residents live.

This spatial analysis is important because the 2021 data show the highest suicide rates among older adults, yet mental health resources are often planned uniformly rather than in response to where ageing populations are concentrated.

When interpreted alongside age-specific suicide risk, the maps highlight estates such as Tampines, Jurong West, Bedok, Yishun, and Hougang as areas where both current (65+) and future (55–64 transitioning into 65+) demand for mental health services is likely to be greatest.
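The estates named above can be ranked more explicitly by adding the two cohorts together. The following is a minimal sketch using only three estates, with figures copied from the two top-10 tables above (2018). The `number` values are written as strings on the assumption that this is how they arrive from the API, since the notebook casts them with `astype(int)`. The helper name `rank_by_combined_population` is hypothetical.

```python
from collections import defaultdict

# Figures copied from the two top-10 tables (2018); only three estates are
# included to keep the sketch short.
records = [
    {"area_key": "BEDOK", "elderly_pop": "Elderly", "number": "38200"},
    {"area_key": "BEDOK", "elderly_pop": "Future Elderly", "number": "32300"},
    {"area_key": "TAMPINES", "elderly_pop": "Elderly", "number": "31000"},
    {"area_key": "TAMPINES", "elderly_pop": "Future Elderly", "number": "41000"},
    {"area_key": "JURONG WEST", "elderly_pop": "Elderly", "number": "33400"},
    {"area_key": "JURONG WEST", "elderly_pop": "Future Elderly", "number": "35800"},
]


def rank_by_combined_population(rows):
    """Sum residents aged 55+ per area and sort from largest to smallest."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["area_key"]] += int(row["number"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_combined_population(records)
for area, total in ranking:
    print(f"{area}: {total}")
```

For these three estates the combined totals are 72,000 for Tampines, 70,500 for Bedok, and 69,200 for Jurong West. Tampines ranks first on the combined count even though Bedok has more residents aged 65 and above, which is why the two maps are best read together.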

For policymakers and clinicians, this supports targeted deployment of mental health services, community outreach, and early screening programmes in specific locations rather than broad nationwide expansion.

A key limitation of this dataset is that population size does not directly measure mental health need or increased suicide risk. It does not account for other factors such as social support, existing service capacity, or individual-level risk. Therefore, it should be interpreted as an indicator of potential service demand rather than actual clinical burden.

Conclusion¶

This analysis demonstrates that while Singapore’s overall age-standardised suicide rate has declined over time, the aggregated improvement conceals substantial demographic and spatial heterogeneity in risk.

Disaggregation by age and sex reveals that older adults, particularly elderly males, remain disproportionately vulnerable, underscoring the importance of targeted rather than general interventions.

By mapping the spatial concentration of future elderly and elderly populations, we can translate these demographic insights into actionable guidance and identify town estates where demand for mental health services is likely to be highest both currently and in the near future.

Together, the macro, micro, and spatial perspectives highlight the value of integrating epidemiological trends with demographic and geographic data to support proactive mental health planning.

A key limitation of this study is data timeliness. The suicide rate estimates are only available up to 2021, while population data are available up to 2018. More recent datasets would enable more accurate and responsive analysis, particularly in light of recent social and economic changes.

AI Declaration¶

I used ChatGPT to improve the phrasing of sentences in this assignment. I am responsible for the content and quality of the submitted work.